test_show

Basic Overview

The data which was used for this analysis comes from: https://mixcr.com/mixcr/guides/milab-dna-multiplex-tcr/

This report was created to show the versatility of the pipeline and the report structure. The associated binding data is therefore entirely simulated and was created only to show how it can be clustered in several plots with ExpoSeq.

Theory

Rarefaction curves

The x-axis of the plot shows the total number of sampled sequences, and the y-axis shows the total number of unique sequences.

The plot can be interpreted as follows:

  • Samples with a more diverse repertoire show more unique sequences at each point on the x-axis, and their curves keep rising for longer.
  • Samples with a less diverse repertoire show fewer unique sequences at each point on the x-axis, and their curves flatten earlier.

This plot is useful for NGS because it can be used to assess the quality of the sequencing data and to determine whether the sequencing depth is sufficient for the desired application. For example, if you are trying to identify rare variants in a sample, you will need to have a high sequencing depth in order to be confident that the variants are real and not simply sequencing errors.

Here are some specific examples of how the plot could be used to interpret NGS data:

  1. If you are comparing two samples with different sequencing depths, you can use the plot to see which sample has more unique sequences at each point on the x-axis. This can help you to determine which sample is more likely to contain the variants of interest.
  2. If you are trying to determine whether the sequencing depth of a sample is sufficient for a particular application, you can use the plot to compare the sample to other samples that have been used successfully for the same application.
  3. If you are seeing a plateau in the number of unique sequences as the sequencing depth increases, this may indicate that you have reached a saturation point and that further sequencing will not yield much new information.
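The construction of such a curve can be sketched in a few lines of Python. This is a minimal illustration and not the code used by the pipeline: the function name `rarefaction_curve`, the `step` parameter and the two clone-count dictionaries are assumptions made up for this example. A real analysis would usually average over many random subsamples instead of using a single shuffle.

```python
import random


def rarefaction_curve(clone_counts, step, seed=0):
    """Return (reads_sampled, unique_sequences) points for one sample.

    clone_counts maps each sequence to its read count. The reads are
    shuffled once and the unique sequences are counted at growing prefixes.
    """
    reads = [seq for seq, n in clone_counts.items() for _ in range(n)]
    random.Random(seed).shuffle(reads)
    seen = set()
    points = []
    for i, seq in enumerate(reads, start=1):
        seen.add(seq)
        if i % step == 0 or i == len(reads):
            points.append((i, len(seen)))
    return points


# Hypothetical clone counts, not taken from this report.
# Both samples contain 500 reads but differ in diversity.
even_sample = {f"clone_{i}": 10 for i in range(50)}
skewed_sample = {"clone_0": 455, **{f"clone_{i}": 5 for i in range(1, 10)}}

for name, sample in [("even", even_sample), ("skewed", skewed_sample)]:
    print(name, rarefaction_curve(sample, step=100))
```

Both curves end at the total number of distinct clones in the sample (50 and 10). A curve that has already reached this value before all reads are sampled corresponds to the plateau described in point 3.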

Alignment Quality Plot

The barplot shows the total sequenced reads and aligned reads for each sample. The total sequenced reads is the number of reads that were generated by the NGS sequencer. The aligned reads is the number of reads that were successfully mapped to the reference genome.

The plot shows that the total sequenced reads is higher than the aligned reads for all samples. This is because some of the reads may be of low quality or may not map to the reference genome.

Furthermore, it shows that the difference between the total sequenced reads and aligned reads is larger for some samples than others. This may be due to a number of factors, such as the quality of the DNA sample, the sequencing platform used, and the alignment parameters used.

Use of the barplot for NGS:

The barplot can be used to assess the quality of the NGS data and to determine whether the alignment rate is sufficient for the desired application.

Specific examples:

  1. If you are comparing two samples, you can use the plot to see which sample has a higher alignment rate. This can help you to determine which sample is more likely to contain the variants of interest.
  2. If you are trying to determine whether the alignment rate for a sample is sufficient for a particular application, you can compare the sample to other samples that have been used successfully for the same application.
  3. If you are seeing a low alignment rate for a sample, this may indicate that there is a problem with the DNA sample, the sequencing platform used, or the MiXCR parameters used.
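The alignment rate referred to above is simply the fraction of sequenced reads that were aligned. A minimal sketch follows; the function name and the read counts are hypothetical and are not taken from the samples in this report.

```python
def alignment_rate(total_reads, aligned_reads):
    """Fraction of sequenced reads that were successfully aligned."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= aligned_reads <= total_reads:
        raise ValueError("aligned_reads must lie between 0 and total_reads")
    return aligned_reads / total_reads


# Hypothetical (total, aligned) read counts per sample
read_counts = {
    "sample_A": (1_000_000, 920_000),
    "sample_B": (800_000, 480_000),
}

for name, (total, aligned) in read_counts.items():
    print(f"{name}: {alignment_rate(total, aligned):.1%} aligned")
```

Comparing rates instead of raw aligned read counts keeps samples with different sequencing depths comparable.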

Heatmap based on Morisita-Horn Index

The heatmap shows a matrix whose values are calculated with the Morisita-Horn index. This index measures the degree of overlap between two samples, each of which is composed of many elements with different abundances. It is especially useful for comparing two samples that contain different numbers of clones and have different numbers of aligned reads. In the context of this heatmap, identity means that two sequences have the same length and exactly the same amino acid sequence.
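For reference, the index can be computed from two dictionaries of clone counts as sketched below. This follows the textbook definition of the Morisita-Horn index and is not necessarily how the pipeline implements it; the function name and the clone counts are assumptions for illustration.

```python
def morisita_horn(counts_a, counts_b):
    """Morisita-Horn overlap of two samples given as {sequence: count}.

    Returns 0.0 for samples without shared sequences and 1.0 for samples
    with identical relative clone frequencies.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both samples must contain at least one read")
    shared = counts_a.keys() & counts_b.keys()
    cross = sum(counts_a[s] * counts_b[s] for s in shared)
    # Simpson-type dominance terms of each sample
    d_a = sum(n * n for n in counts_a.values()) / total_a**2
    d_b = sum(n * n for n in counts_b.values()) / total_b**2
    return 2 * cross / ((d_a + d_b) * total_a * total_b)


# Hypothetical clone counts; only the sequences appear in this report
sample_1 = {"CASSRLAGGTDTQYF": 30, "CASSSGTEQFF": 10}
sample_2 = {"CASSRLAGGTDTQYF": 3, "CASSSGTEQFF": 1}
sample_3 = {"CAVMDSSYKLIF": 5}

print(morisita_horn(sample_1, sample_2))  # same proportions, different depth
print(morisita_horn(sample_1, sample_3))  # no shared sequence
```

Because the counts are normalised by the sample totals, two samples with the same clone proportions score 1 even if one was sequenced ten times deeper.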

The heatmap has several possible applications during quality control of the sequencing:

  1. Identification of cross-contamination between samples. This is indicated by a high degree of identity between samples that were panned against different antigens.
  2. Validation of the panning process. This is indicated by a high degree of identity between samples that were panned against the same antigen in different rounds.

Weaknesses:

  • The Morisita-Horn index becomes less accurate if the number of elements (sequences) in one of the samples is very low.

Alternatives:

The user can also analyse the identity between the samples based on:

  • Jaccard Index
  • Sorensen-Dice Index
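Both alternatives work on the sets of distinct sequences and ignore clone counts. A minimal sketch using the standard set definitions is given below; the function names are illustrative, and the two small sets contain a few sequences that appear in the tables of this report, chosen only for demonstration.

```python
def jaccard_index(a, b):
    """Shared sequences divided by all distinct sequences of both samples."""
    a, b = set(a), set(b)
    if not a and not b:
        raise ValueError("at least one sample must contain a sequence")
    return len(a & b) / len(a | b)


def sorensen_dice_index(a, b):
    """Twice the shared sequences divided by the sum of both sample sizes."""
    a, b = set(a), set(b)
    if not a and not b:
        raise ValueError("at least one sample must contain a sequence")
    return 2 * len(a & b) / (len(a) + len(b))


# Illustrative subsets, not the complete samples
sample_a = {"CASSRLAGGTDTQYF", "CAVTDQAGTALIF", "CASSSGTEQFF", "CASSLGQYTGELFF"}
sample_b = {"CASSRLAGGTDTQYF", "CAVTDQAGTALIF", "CASSSGTEQFF",
            "CAVSEIGSGNTGKLIF", "CASSWQGSPDTQYF"}

print(jaccard_index(sample_a, sample_b))        # 3 shared of 6 distinct
print(sorensen_dice_index(sample_a, sample_b))  # 2 * 3 / (4 + 5)
```

Unlike the Morisita-Horn index, both measures treat a clone seen once the same as a dominant clone, so they respond more strongly to rare sequences.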

The content generated in this section was written by Nils Hofmann. The author does not guarantee the correctness of the content.

Identity between samples

Identity based on Morisita-Horn Index

Identity based on Jaccard Index

Identity based on Sorensen Index

Sequencing Quality of samples

General Sequencing Quality

Sample specific quality analysis

Quality analysis for GeneMind_1

Quality analysis for GeneMind_2

Quality analysis for GeneMind_3

Quality analysis for GeneMind_4

Quality analysis for GeneMind_5

Quality analysis for GeneMind_6

Quality analysis for GeneMind_7

Quality analysis for GeneMind_8

Sequence Logo Plots

Logo Plot for sequences with length 16 for GeneMind_1

Logo Plot for sequences with length 16 for GeneMind_2

Logo Plot for sequences with length 16 for GeneMind_3

Logo Plot for sequences with length 16 for GeneMind_4

Logo Plot for sequences with length 16 for GeneMind_5

Logo Plot for sequences with length 16 for GeneMind_6

Logo Plot for sequences with length 16 for GeneMind_7

Logo Plot for sequences with length 16 for GeneMind_8

Sequence Clusters

Theory

Clustering based on Levenshtein distance

The Levenshtein distance is a metric that measures how different two strings are. It is the minimum number of single-character edits (insertions, deletions, and substitutions) required to transform one string into the other, so a smaller distance means more similar strings.

For example, the Levenshtein distance between the strings “CAT” and “DOG” is 3, because the two strings share no letter at any position and each of the three letters has to be substituted to transform “CAT” into “DOG”.
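The definition can be turned into code with the standard dynamic-programming scheme. The following is a minimal sketch for illustration, not the implementation used by the pipeline; the two CDR3 sequences in the usage example appear in the tables of this report.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# The two sequences differ by one substitution (Y -> T at position 5)
print(levenshtein("CASSYRGTGELFF", "CASSTRGTGELFF"))  # 1
```

Only two rows of the distance matrix are kept in memory, which is sufficient when only the distance and not the edit path is needed.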

The Levenshtein distance can be used to analyze peptide sequences in a variety of ways. For example, it can be used to:

  1. Identify similar CDR3 regions. This can be useful for identifying antibodies with similar antigen-binding specificities.
  2. Detect mutations in CDR3 regions. This can be useful for identifying mutations that may affect the affinity or specificity of an antibody.
  3. Cluster CDR3 regions into groups. This can be useful for identifying groups of CDR3 regions with similar antigen-binding specificities.

The connected components in the plot (shown as a network of lines) are groups of CDR3 sequences that are similar to each other, based on a threshold for the Levenshtein distance (default = 2). The size of each circle represents the clone count of the corresponding CDR3 sequence relative to the clone counts of the other sequences.
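How connected components arise from a distance threshold can be sketched as follows. This is a simplified illustration that assumes two sequences are linked when their Levenshtein distance is at most the threshold; the function names are hypothetical, and the pipeline may build its network differently. The sequences are taken from the tables of this report.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance with insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def cluster_sequences(sequences, threshold=2):
    """Group sequences into connected components of the graph in which
    two sequences are linked if their distance is <= threshold."""
    sequences = list(dict.fromkeys(sequences))  # drop duplicates, keep order
    parent = {s: s for s in sequences}

    def find(s):
        # Follow parent links to the representative, compressing the path
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(sequences, 2):
        if levenshtein(a, b) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for s in sequences:
        clusters.setdefault(find(s), []).append(s)
    return list(clusters.values())


sequences = ["CASSYRGTGELFF", "CASSTRGTGELFF", "CAVNLDTGNQFYF",
             "CAGVPLDTGNQFYF", "CAVMDSSYKLIF"]
for members in cluster_sequences(sequences, threshold=2):
    print(members)
```

Note that membership is transitive: two sequences can end up in the same component even if their direct distance exceeds the threshold, as long as a chain of close intermediates connects them.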

Overall, the plot can be used to get a general overview of the diversity and similarity of the CDR3 sequences in the library.

Furthermore, a table lists the sequences sorted by clone count, so the top 10 sequences are those with the highest clone fraction in that sample. This table, which comes from a report, can be used to infer specificity for a certain antigen from the density of these sequences in certain clusters.

Disadvantages of the Levenshtein distance

  • Only captures the global similarity of sequences and does not focus on specific regions
  • Does not take any chemical or structural properties into account

Sequence Cluster Dendrogram

The dendrogram is a visualization of the similarity between a set of CDR3 heavy chain sequences. The sequences are clustered based on their Levenshtein distance, which is described in the previous section.

The dendrogram can be used to understand the diversity of the CDR3 sequences in a sample. For example, if the dendrogram shows that most of the sequences are clustered together in a few large clusters, this suggests that there is a relatively low level of diversity in the sample. On the other hand, if the dendrogram shows that the sequences are spread out in many small clusters, this suggests that there is a high level of diversity in the sample.

The dendrogram can also be used to identify groups of similar sequences. For example, if we want to identify a group of sequences that are likely to have similar antigen-binding specificities, we can look for a cluster of sequences that are close to each other on the dendrogram.

Clustering based on t-SNE

To address the limitations of the Levenshtein distance for clustering HCDR3 sequences, a plot was created that relates the sequences to each other over their whole length, enabling a more extensive analysis of the sequences and their binding. To represent the sequences as vectors, the SGT (Sequence Graph Transform) embedding was applied.

This embedding does not capture any chemical properties of the amino acid chain. Instead, it tries to recognize the characteristic relative positions of letters within a sequence, which makes it possible to identify patterns shared by sequences of different lengths. This accommodates the length variability of the CDR3 sequences and makes it possible to capture their similarity. The SGT embedder was initialized using the package sgt, with the length-sensitive parameter set to True and the parameter kappa set to 1. The embedding produces an output of 400 dimensions per sequence, which was reduced to 80 dimensions using principal component analysis (PCA).

Subsequently, t-SNE was applied to arrange the 80-dimensional vectors in two-dimensional space, because reducing the data all the way to two dimensions with PCA alone would describe its global structure insufficiently. t-SNE gives a representative arrangement of the points without losing too much of their information.
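The PCA step from 400 to 80 dimensions can be illustrated with NumPy alone. This is a sketch under stated assumptions: the embedding matrix is filled with random placeholder values instead of real SGT output, the function name `pca_reduce` is made up for this example, and the report does not state which library performed the PCA. The subsequent t-SNE step is not reproduced here.

```python
import numpy as np


def pca_reduce(embeddings, n_components):
    """Project the rows of `embeddings` onto their first principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Placeholder data: 200 sequences with 400 features each
# (400 = 20 x 20 amino acid pairs); real input would be SGT embeddings.
rng = np.random.default_rng(0)
embeddings = rng.random((200, 400))

reduced = pca_reduce(embeddings, n_components=80)
print(reduced.shape)
```

The reduced matrix would then be handed to a t-SNE implementation to obtain the two plot coordinates per sequence.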

The plot can be used in a variety of ways, such as:

  1. Identify similar CDR3 regions.
  2. Learn about the differences between samples.
  3. Identify specific sequence differences for samples which were panned against different antigens.

The content generated in this section was written by Nils Hofmann. The author does not guarantee the correctness of the content.

Levenshtein distance based

Relation of all samples

Sequence Similarity based on Levenshtein Distance of GeneMind_1

Row  Cluster No.  Sequences
0    33           CASSRLAGGTDTQYF
1    1            *GGADGLTF
2    2            CASSLGQYTGELFF
3    3            CAVTDQAGTALIF
4    4            CASSSGTEQFF
5    5            CASSPRGGANVLTF
6    6            CAVRDSGYSTLTF
7    7            CAVVTGGFKTIF
8    8            CASSPTTGFAGELFF
9    9            CASSLEAGSYNEQFF

Sequence Similarity based on Levenshtein Distance of GeneMind_2

Row  Cluster No.  Sequences
0    0            CASSRLAGGTDTQYF
1    1            CAVTDQAGTALIF
2    2            CASSPRGGANVLTF
3    3            CAVSEIGSGNTGKLIF
4    4            CAVRDSGYSTLTF
5    5            CASSSGTEQFF
6    6            CASSLEAGSYNEQFF
7    7            CAVVTGGFKTIF
8    8            CVVSLSGGSYIPTF
9    9            CASSWQGSPDTQYF

Sequence Similarity based on Levenshtein Distance of GeneMind_3

Row  Cluster No.  Sequences
0    0            CASSYRGTGELFF
1    1            CALALYNQGGKLIF
2    2            CAMRGYYGNNRLAF
3    3            CASSKWTGELFF
4    4            CAASMSKAAGNKLTF
5    5            CAVRAPYGGATNKLIF
6    16           CAALWGGKLIF
7    1            CAVKVGNQGGKLIF
8    7            CAVNSGNTPLVF
9    0            CASSYSMNTEAFF

Sequence Similarity based on Levenshtein Distance of GeneMind_4

Row  Cluster No.  Sequences
0    0            CASSYRGTGELFF
1    1            CASSLEGTGIGNTIYF
2    2            CALALYNQGGKLIF
3    0            CASSKWTGELFF
4    2            CAALWGGKLIF
5    0            CASSYSMNTEAFF
6    2            CAVKVGNQGGKLIF
7    4            CASSLEGEQFF
8    5            CVVSDRGSTLGRLYF
9    0            CASSTRGTGELFF

Sequence Similarity based on Levenshtein Distance of GeneMind_5

Row  Cluster No.  Sequences
0    0            CASSLI*GTEAFF
1    1            CA*ETSGSRLTF
2    2            CAVMDSSYKLIF
3    3            CAARVSGGYNKLIF
4    4            CASSLGQPSYNEQFF
5    5            CASSEVLSEKLFF
6    6            CAVSDGTGGFKTIF
7    7            CASSQGRGALYNEQFF
8    8            CAESMTTDSWGKLQF
9    6            CAVDGGFKTIF

Sequence Similarity based on Levenshtein Distance of GeneMind_6

Row  Cluster No.  Sequences
0    0            CASSPGTGTYGYTF
1    1            CASSLI*GTEAFF
2    2            CA*ETSGSRLTF
3    3            CAVMDSSYKLIF
4    4            CAVRSNFGNEKLTF
5    5            CAVSNAGNMLTF
6    6            CAARVSGGYNKLIF
7    7            CASSLVLEETQYF
8    8            CASSLGLNSGNTIYF
9    9            CASSLGQPSYNEQFF

Sequence Similarity based on Levenshtein Distance of GeneMind_7

Row  Cluster No.  Sequences
0    0            CAVNLDTGNQFYF
1    1            CASYYGGSQGNLIF
2    2            CASSFGGNQPQHF
3    3            CASSLGASGGADTQYF
4    4            CAVILPTGGFKTIF
5    5            CASSEGPSSGNTIYF
6    6            CAVYNQGGKLIF
7    7            CASSYRPSSYNEQFF
8    4            CAVDTGGFKTIF
9    0            CAGVPLDTGNQFYF

Sequence Similarity based on Levenshtein Distance of GeneMind_8

Row  Cluster No.  Sequences
0    0            CASSYQGATEAFF
1    1            CAVNLDTGNQFYF
2    2            CASSFGGNQPQHF
3    3            CASYYGGSQGNLIF
4    4            CASSLGASGGADTQYF
5    5            CAVILPTGGFKTIF
6    6            CAVRKPGGSYIPTF
7    7            CAVIIAGNMLTF
8    8            CAVYNQGGKLIF
9    9            CASSEGPSSGNTIYF

Sequence Similarity between samples

SGT Embedder

Global Sequence similarity for GeneMind_1

Global Sequence similarity for GeneMind_2

Global Sequence similarity for GeneMind_3

Global Sequence similarity for GeneMind_4

Global Sequence similarity for GeneMind_5

Global Sequence similarity for GeneMind_6

Global Sequence similarity for GeneMind_7

Global Sequence similarity for GeneMind_8

ProtBert Embedder

Global Sequence similarity for GeneMind_1

Global Sequence similarity for GeneMind_2

Global Sequence similarity for GeneMind_3

Global Sequence similarity for GeneMind_4

Global Sequence similarity for GeneMind_5

Global Sequence similarity for GeneMind_6

Global Sequence similarity for GeneMind_7

Global Sequence similarity for GeneMind_8

T5 Embedder

Global Sequence similarity for GeneMind_1

Cluster Antigens

Theory

Sequence embedding and t-SNE - REPORT

In addition to the plot, a report was created which contains the results of the t-SNE embedding. This report contains the following columns:

  • tsne1: Contains the coordinates for the first dimension (x-axis) of the plot
  • tsne2: Contains the coordinates for the second dimension (y-axis) of the plot
  • experiments_factorized: Contains a factorized form of the experiment column. You can ignore this column
  • experiments_string: Contains the sample name from the NGS data used for this plot
  • binding: Contains the binding value which was taken from the binding data for the corresponding antigen. NGS sequences have the value 0
  • sequences: Amino acid sequence for the given coordinates and binding value
  • sequence_id: Can be used to identify the position of the sequence in the right plot with the numbers

The plot can be used to identify sequences with potentially high binding based on the values of neighboring sequences. Furthermore, clusters with high local binding may be identified, and sequence attributes may be derived from them.
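One way to use the report columns for this purpose is to look up the nearest sequences with binding data around a point of interest in the t-SNE plane. The sketch below is an illustration only: the helper names, the query point and the choice of k are assumptions, while the rows are a subset of the GeneMind_1 embedding table in this report (columns tsne1, tsne2, binding, sequences).

```python
import math

# (tsne1, tsne2, binding, sequences): subset of the GeneMind_1 report
report_rows = [
    (-2.81362, 25.5781, 1.08003e6, "CASSSGTEQFF"),
    (-5.6064, 3.2838, 700032, "CASSPRGGANVLTF"),
    (-1.01123, 4.71494, 509403, "CASSPGTGTYGYTF"),
    (-2.27191, 13.2591, 393000, "CASSLEAGSYNEQFF"),
    (9.19538, -15.4826, 140030, "CAVMDSSYKLIF"),
    (-4.12407, 12.5336, 60300, "CASSLEGEQFF"),
    (3.84862, -17.4211, 40300, "CAALWGGKLIF"),
]


def nearest_neighbors(rows, point, k):
    """Return the k rows with binding data closest to `point` in the plot."""
    with_binding = [row for row in rows if row[2] > 0]  # NGS-only rows are 0
    by_distance = sorted(
        with_binding,
        key=lambda row: math.hypot(row[0] - point[0], row[1] - point[1]),
    )
    return by_distance[:k]


def mean_neighbor_binding(rows, point, k):
    """Average binding value of the k nearest rows with binding data."""
    neighbors = nearest_neighbors(rows, point, k)
    return sum(row[2] for row in neighbors) / len(neighbors)


query = (3.0, -16.0)  # hypothetical position of an NGS sequence in the plot
for row in nearest_neighbors(report_rows, query, k=2):
    print(row[3], row[2])
print(mean_neighbor_binding(report_rows, query, k=2))
```

Distances in a t-SNE plot are only meaningful locally, so such an estimate should be treated as a hint for choosing candidates, not as a prediction.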

Levenshtein Distance based Clustering - REPORT

The report for these plots can be used to locate sequences in the network plots. The report contains the following columns:

  • Clusters_[YOUR_SAMPLE_NAME]_No.: Contains the cluster number for the given sequence. This is useful if you want to see the sequences which are located in the same cluster and thus have a high sequence similarity.
  • Sequences_[YOUR_SAMPLE_NAME]: The sequences for the corresponding cluster number

This plot can be particularly useful if you are looking for potential high binders based only on point mutations.
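Looking up the cluster mates of a sequence in such a report can be sketched as follows. The rows reproduce the LS-Distance cluster table for GeneMind_4 from this report; the helper names are hypothetical and the rows are typed in by hand here instead of being read from the report file.

```python
from collections import defaultdict

CLUSTER_COL = "Clusters_GeneMind_4_No."
SEQUENCE_COL = "Sequences_GeneMind_4"

report_rows = [
    {CLUSTER_COL: 3, SEQUENCE_COL: "CASSYRGTGELFF"},
    {CLUSTER_COL: 1, SEQUENCE_COL: "CASSLEGTGIGNTIYF"},
    {CLUSTER_COL: 2, SEQUENCE_COL: "CALALYNQGGKLIF"},
    {CLUSTER_COL: 3, SEQUENCE_COL: "CASSKWTGELFF"},
    {CLUSTER_COL: 2, SEQUENCE_COL: "CAALWGGKLIF"},
    {CLUSTER_COL: 3, SEQUENCE_COL: "CASSYSMNTEAFF"},
    {CLUSTER_COL: 2, SEQUENCE_COL: "CAVKVGNQGGKLIF"},
    {CLUSTER_COL: 4, SEQUENCE_COL: "CASSLEGEQFF"},
    {CLUSTER_COL: 5, SEQUENCE_COL: "CVVSDRGSTLGRLYF"},
    {CLUSTER_COL: 3, SEQUENCE_COL: "CASSTRGTGELFF"},
]


def group_by_cluster(rows, cluster_col, sequence_col):
    """Map each cluster number to the list of its sequences."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row[cluster_col]].append(row[sequence_col])
    return dict(clusters)


def cluster_mates(rows, sequence, cluster_col, sequence_col):
    """Return the other sequences that share a cluster with `sequence`."""
    for members in group_by_cluster(rows, cluster_col, sequence_col).values():
        if sequence in members:
            return [s for s in members if s != sequence]
    return []


print(group_by_cluster(report_rows, CLUSTER_COL, SEQUENCE_COL)[2])
print(cluster_mates(report_rows, "CASSKWTGELFF", CLUSTER_COL, SEQUENCE_COL))
```

Sequences returned as cluster mates of a known binder are natural candidates to inspect for point mutations.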

The content generated in this section was written by Nils Hofmann. The author does not guarantee the correctness of the content.

Cluster with embedding

Sequence embedding of GeneMind_1

index tsne1 tsne2 binding sequences sequence_id
8 -2.81362 25.5781 1.08003e+06 CASSSGTEQFF 7
10 -5.6064 3.2838 700032 CASSPRGGANVLTF 9
3 -1.01123 4.71494 509403 CASSPGTGTYGYTF 2
9 -2.27191 13.2591 393000 CASSLEAGSYNEQFF 8
4 9.19538 -15.4826 140030 CAVMDSSYKLIF 3
5 -4.12407 12.5336 60300 CASSLEGEQFF 4
6 3.84862 -17.4211 40300 CAALWGGKLIF 5
11 3.12499 -18.5399 30403 CAVIIAGNMLTF 10
2 1.48415 -11.4624 10450 CAVYNQGGKLIF 1
7 8.86475 -6.86168 10030 CAASMSKAAGNKLTF 6

Sequence embedding of GeneMind_2

index tsne1 tsne2 binding sequences sequence_id
8 13.3534 13.4245 1.08003e+06 CASSSGTEQFF 7
10 -3.55645 8.66756 700032 CASSPRGGANVLTF 9
3 0.449737 3.59158 509403 CASSPGTGTYGYTF 2
9 0.927137 13.5509 393000 CASSLEAGSYNEQFF 8
4 -16.7241 -2.28946 140030 CAVMDSSYKLIF 3
5 2.97523 11.1632 60300 CASSLEGEQFF 4
6 9.55936 -13.9243 40300 CAALWGGKLIF 5
11 12.7672 -13.9412 30403 CAVIIAGNMLTF 10
2 5.34285 -19.2417 10450 CAVYNQGGKLIF 1
7 -8.19522 -5.17954 10030 CAASMSKAAGNKLTF 6

Sequence embedding of GeneMind_3

index tsne1 tsne2 binding sequences sequence_id
2 -11.6365 -6.71831 1.604e+06 CASSRLAGGTDTQYF 1
9 -9.73683 -12.9966 1.08003e+06 CASSSGTEQFF 8
11 -8.92439 -3.50094 700032 CASSPRGGANVLTF 10
4 -6.99139 14.0796 509403 CASSPGTGTYGYTF 3
10 -15.659 -1.47393 393000 CASSLEAGSYNEQFF 9
5 10.0569 5.19933 140030 CAVMDSSYKLIF 4
6 -15.6888 -0.332381 60300 CASSLEGEQFF 5
7 10.459 8.73661 40300 CAALWGGKLIF 6
12 12.7342 9.93005 30403 CAVIIAGNMLTF 11
3 9.76606 0.72657 10450 CAVYNQGGKLIF 2

Sequence embedding of GeneMind_4

index tsne1 tsne2 binding sequences sequence_id
2 -13.158 3.37895 1.604e+06 CASSRLAGGTDTQYF 1
9 -13.9523 5.37201 1.08003e+06 CASSSGTEQFF 8
11 -4.92456 0.595963 700032 CASSPRGGANVLTF 10
4 2.48044 -13.6071 509403 CASSPGTGTYGYTF 3
10 -9.97752 -2.91278 393000 CASSLEAGSYNEQFF 9
5 13.3271 17.2177 140030 CAVMDSSYKLIF 4
6 -8.86293 -2.32666 60300 CASSLEGEQFF 5
7 13.2121 2.03496 40300 CAALWGGKLIF 6
12 -5.37406 -17.3975 30403 CAVIIAGNMLTF 11
3 6.55099 7.56039 10450 CAVYNQGGKLIF 2

Sequence embedding of GeneMind_5

index tsne1 tsne2 binding sequences sequence_id
2 11.1622 -6.68973 1.604e+06 CASSRLAGGTDTQYF 1
9 12.7642 -11.8674 1.08003e+06 CASSSGTEQFF 8
11 6.84346 -2.93172 700032 CASSPRGGANVLTF 10
4 4.60303 -11.0927 509403 CASSPGTGTYGYTF 3
10 15.8303 1.51701 393000 CASSLEAGSYNEQFF 9
5 -13.9576 13.1687 140030 CAVMDSSYKLIF 4
6 11.4019 6.30659 60300 CASSLEGEQFF 5
7 -18.5941 -5.7488 40300 CAALWGGKLIF 6
12 -19.2897 -7.06993 30403 CAVIIAGNMLTF 11
3 -8.26526 -2.59342 10450 CAVYNQGGKLIF 2

Sequence embedding of GeneMind_6

index tsne1 tsne2 binding sequences sequence_id
2 4.19497 3.54675 1.604e+06 CASSRLAGGTDTQYF 1
9 18.977 4.64954 1.08003e+06 CASSSGTEQFF 8
11 10.0321 -1.39943 700032 CASSPRGGANVLTF 10
4 11.002 -8.21706 509403 CASSPGTGTYGYTF 3
10 5.29697 10.7443 393000 CASSLEAGSYNEQFF 9
5 -4.00156 -20.6722 140030 CAVMDSSYKLIF 4
6 6.96238 8.75441 60300 CASSLEGEQFF 5
7 -13.2817 -11.9341 40300 CAALWGGKLIF 6
12 -14.0234 -12.4237 30403 CAVIIAGNMLTF 11
3 -6.24732 -11.7237 10450 CAVYNQGGKLIF 2

Sequence embedding of GeneMind_7

index tsne1 tsne2 binding sequences sequence_id
2 -12.6219 6.46699 1.604e+06 CASSRLAGGTDTQYF 1
9 6.0008 -15.546 1.08003e+06 CASSSGTEQFF 8
11 -5.5028 0.312424 700032 CASSPRGGANVLTF 10
4 2.94409 14.5411 509403 CASSPGTGTYGYTF 3
10 -10.909 -1.27156 393000 CASSLEAGSYNEQFF 9
5 -1.62621 17.9219 140030 CAVMDSSYKLIF 4
6 -10.7365 -3.6284 60300 CASSLEGEQFF 5
7 9.83293 -6.33807 40300 CAALWGGKLIF 6
12 8.73087 -7.90391 30403 CAVIIAGNMLTF 11
3 11.6667 -1.64666 10450 CAVYNQGGKLIF 2

Sequence embedding of GeneMind_8

index tsne1 tsne2 binding sequences sequence_id
2 -8.85695 6.99166 1.604e+06 CASSRLAGGTDTQYF 1
9 -10.1774 5.85169 1.08003e+06 CASSSGTEQFF 8
11 -5.46811 5.85256 700032 CASSPRGGANVLTF 10
4 13.93 0.916507 509403 CASSPGTGTYGYTF 3
10 1.40869 16.1609 393000 CASSLEAGSYNEQFF 9
5 1.428 -22.1835 140030 CAVMDSSYKLIF 4
6 0.711827 15.2059 60300 CASSLEGEQFF 5
7 8.11375 -3.02137 40300 CAALWGGKLIF 6
12 7.54816 -1.63007 30403 CAVIIAGNMLTF 11
3 0.668597 -14.647 10450 CAVYNQGGKLIF 2

Cluster binding data based on Levenshtein distance

LS-Distance cluster for GeneMind_1

Row  Clusters_GeneMind_1_No.  Sequences_GeneMind_1
0    27                       CASSRLAGGTDTQYF
1    1                        *GGADGLTF
2    2                        CASSLGQYTGELFF
3    3                        CAVTDQAGTALIF
4    4                        CASSSGTEQFF
5    5                        CASSPRGGANVLTF
6    6                        CAVRDSGYSTLTF
7    7                        CAVVTGGFKTIF
8    8                        CASSPTTGFAGELFF
9    9                        CASSLEAGSYNEQFF

LS-Distance cluster for GeneMind_2

Row  Clusters_GeneMind_2_No.  Sequences_GeneMind_2
0    0                        CASSRLAGGTDTQYF
1    1                        CAVTDQAGTALIF
2    2                        CASSPRGGANVLTF
3    3                        CAVSEIGSGNTGKLIF
4    4                        CAVRDSGYSTLTF
5    5                        CASSSGTEQFF
6    6                        CASSLEAGSYNEQFF
7    7                        CAVVTGGFKTIF
8    8                        CVVSLSGGSYIPTF
9    9                        CASSWQGSPDTQYF

LS-Distance cluster for GeneMind_3

Row  Clusters_GeneMind_3_No.  Sequences_GeneMind_3
0    0                        CASSYRGTGELFF
1    1                        CALALYNQGGKLIF
2    2                        CAMRGYYGNNRLAF
3    3                        CASSKWTGELFF
4    4                        CAASMSKAAGNKLTF
5    5                        CAVRAPYGGATNKLIF
6    16                       CAALWGGKLIF
7    1                        CAVKVGNQGGKLIF
8    7                        CAVNSGNTPLVF
9    0                        CASSYSMNTEAFF

LS-Distance cluster for GeneMind_4

Row  Clusters_GeneMind_4_No.  Sequences_GeneMind_4
0    3                        CASSYRGTGELFF
1    1                        CASSLEGTGIGNTIYF
2    2                        CALALYNQGGKLIF
3    3                        CASSKWTGELFF
4    2                        CAALWGGKLIF
5    3                        CASSYSMNTEAFF
6    2                        CAVKVGNQGGKLIF
7    4                        CASSLEGEQFF
8    5                        CVVSDRGSTLGRLYF
9    3                        CASSTRGTGELFF

LS-Distance cluster for GeneMind_5

Row  Clusters_GeneMind_5_No.  Sequences_GeneMind_5
0    0                        CASSLI*GTEAFF
1    1                        CA*ETSGSRLTF
2    2                        CAVMDSSYKLIF
3    3                        CAARVSGGYNKLIF
4    4                        CASSLGQPSYNEQFF
5    5                        CASSEVLSEKLFF
6    6                        CAVSDGTGGFKTIF
7    7                        CASSQGRGALYNEQFF
8    8                        CAESMTTDSWGKLQF
9    6                        CAVDGGFKTIF

LS-Distance cluster for GeneMind_6

Row  Clusters_GeneMind_6_No.  Sequences_GeneMind_6
0    0                        CASSPGTGTYGYTF
1    1                        CASSLI*GTEAFF
2    2                        CA*ETSGSRLTF
3    3                        CAVMDSSYKLIF
4    4                        CAVRSNFGNEKLTF
5    5                        CAVSNAGNMLTF
6    6                        CAARVSGGYNKLIF
7    7                        CASSLVLEETQYF
8    8                        CASSLGLNSGNTIYF
9    9                        CASSLGQPSYNEQFF

LS-Distance cluster for GeneMind_7

Row  Clusters_GeneMind_7_No.  Sequences_GeneMind_7
0    0                        CAVNLDTGNQFYF
1    1                        CASYYGGSQGNLIF
2    2                        CASSFGGNQPQHF
3    3                        CASSLGASGGADTQYF
4    4                        CAVILPTGGFKTIF
5    5                        CASSEGPSSGNTIYF
6    6                        CAVYNQGGKLIF
7    7                        CASSYRPSSYNEQFF
8    4                        CAVDTGGFKTIF
9    0                        CAGVPLDTGNQFYF

LS-Distance cluster for GeneMind_8

Row  Clusters_GeneMind_8_No.  Sequences_GeneMind_8
0    0                        CASSYQGATEAFF
1    1                        CAVNLDTGNQFYF
2    2                        CASSFGGNQPQHF
3    3                        CASYYGGSQGNLIF
4    4                        CASSLGASGGADTQYF
5    5                        CAVILPTGGFKTIF
6    6                        CAVRKPGGSYIPTF
7    7                        CAVIIAGNMLTF
8    8                        CAVYNQGGKLIF
9    9                        CASSEGPSSGNTIYF

MiXCR Plots

J-usage

V-usage

Metrics - TRA

Metrics - TRB

Metrics - TRD